This data file contains the hourly count of public bikes rented in the Seoul Bike Sharing System, together with the corresponding weather data and holiday information. It has 14 variables and 8760 observations. We use Rented.Bike.Count (a numeric variable) as our response variable and explore how the other factors (3 categorical variables and several continuous numeric variables) affect the number of bikes rented each hour. Of the other 13 variables, which we plan to use as potential predictors, we expect some to matter more than others, such as temperature, humidity, wind speed, visibility, season, and holiday.
The original data comes from http://data.seoul.go.kr. The holiday information comes from SOUTH KOREA PUBLIC HOLIDAYS. A clean version can be found at the UCI Machine Learning Repository.
Attribute Information:
This data set is interesting to us both personally and from a business perspective. Recently we have seen a rise in the delivery, accessibility, and usage of regular and electric rental bikes. There are clear environmental, health, and economic benefits associated with using bikes as a mode of transportation. We would like to find out which factors lead to an increase in the number of bikes rented and which have the opposite effect. Learning about such factors can help a bike rental business manage its inventory and supply more smoothly. It can also help cities plan for an increase in cyclists, e.g. by opening up more bike lanes on certain days or in certain seasons. Environmentally, we will gain a better understanding of how feasible it is to turn a city into a “bike city”, or of when to look at alternative options because harsh weather makes a city unfriendly to cyclists.
The data file can be successfully loaded into R. We have printed out the structure and first few rows of the data file below.
The column names in the CSV file contain measurement units (such as Wind speed (m/s) and Solar Radiation (MJ/m2)) and characters such as \(^\circ\) and %, so we load the data using cleaned-up column names.
columns = c("Date","Rented.Bike.Count","Hour","Temperature","Humidity",
"Wind.Speed","Visibility","Dew.point.temperature",
"Solar.Radiation","Rainfall","Snowfall","Seasons","Holiday",
"Functioning.Day")
bike = read.csv("../data/SeoulBikeData.csv", col.names = columns)
str(bike)
## 'data.frame': 8760 obs. of 14 variables:
## $ Date : chr "01/12/2017" "01/12/2017" "01/12/2017" "01/12/2017" ...
## $ Rented.Bike.Count : int 254 204 173 107 78 100 181 460 930 490 ...
## $ Hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Temperature : num -5.2 -5.5 -6 -6.2 -6 -6.4 -6.6 -7.4 -7.6 -6.5 ...
## $ Humidity : int 37 38 39 40 36 37 35 38 37 27 ...
## $ Wind.Speed : num 2.2 0.8 1 0.9 2.3 1.5 1.3 0.9 1.1 0.5 ...
## $ Visibility : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 1928 ...
## $ Dew.point.temperature: num -17.6 -17.6 -17.7 -17.6 -18.6 -18.7 -19.5 -19.3 -19.8 -22.4 ...
## $ Solar.Radiation : num 0 0 0 0 0 0 0 0 0.01 0.23 ...
## $ Rainfall : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Snowfall : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Seasons : chr "Winter" "Winter" "Winter" "Winter" ...
## $ Holiday : chr "No Holiday" "No Holiday" "No Holiday" "No Holiday" ...
## $ Functioning.Day : chr "Yes" "Yes" "Yes" "Yes" ...
head(bike)
## Date Rented.Bike.Count Hour Temperature Humidity Wind.Speed Visibility
## 1 01/12/2017 254 0 -5.2 37 2.2 2000
## 2 01/12/2017 204 1 -5.5 38 0.8 2000
## 3 01/12/2017 173 2 -6.0 39 1.0 2000
## 4 01/12/2017 107 3 -6.2 40 0.9 2000
## 5 01/12/2017 78 4 -6.0 36 2.3 2000
## 6 01/12/2017 100 5 -6.4 37 1.5 2000
## Dew.point.temperature Solar.Radiation Rainfall Snowfall Seasons Holiday
## 1 -17.6 0 0 0 Winter No Holiday
## 2 -17.6 0 0 0 Winter No Holiday
## 3 -17.7 0 0 0 Winter No Holiday
## 4 -17.6 0 0 0 Winter No Holiday
## 5 -18.6 0 0 0 Winter No Holiday
## 6 -18.7 0 0 0 Winter No Holiday
## Functioning.Day
## 1 Yes
## 2 Yes
## 3 Yes
## 4 Yes
## 5 Yes
## 6 Yes
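As a small illustration of why the column names are replaced at load time, the sketch below reads a two-column inline CSV whose headers mimic the ones described above. The inline data and the names `csv_text`, `raw`, and `clean` are made up for this example; only `read.csv` and its `col.names` argument come from the code above.

```r
# Hypothetical two-column CSV whose headers carry units, like the real file.
csv_text <- "Wind speed (m/s),Humidity(%)\n2.2,37\n0.8,38"

# Default behaviour: read.csv() passes headers through make.names(),
# which replaces characters that are not valid in R names with dots.
raw <- read.csv(text = csv_text)
print(names(raw))

# Supplying col.names replaces the header row with names of our choosing.
clean <- read.csv(text = csv_text, col.names = c("Wind.Speed", "Humidity"))
print(names(clean))
```

Supplying the names up front keeps every later formula and column reference short and predictable.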
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
bike$Date = as.Date(bike$Date, '%d/%m/%Y')
bike$year = as.numeric(format(bike$Date, '%Y'))
bike$month = as.numeric(format(bike$Date, '%m'))
# lubridate::wday() starts the week on Sunday by default, so 1 = Sunday and 7 = Saturday
bike$wday = wday(bike$Date)
bike$weekend = ifelse(bike$wday == 1 | bike$wday == 7, "Yes", "No")
table(bike$year)
##
## 2017 2018
## 744 8016
table(bike$month)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 744 672 744 720 744 720 744 744 720 744 720 744
table(bike$wday)
##
## 1 2 3 4 5 6 7
## 1248 1248 1248 1248 1248 1272 1248
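The weekend flag relies on the day-of-week numbering, so it is worth cross-checking. A minimal sketch in base R, assuming only the `%d/%m/%Y` date format shown in the data: `as.POSIXlt()` numbers days from 0 (Sunday) to 6 (Saturday), whereas the `wday()` call above uses 1 to 7. The three dates and the variable names are chosen for illustration.

```r
# Three consecutive dates starting at the first date in the data:
# a Friday, a Saturday and a Sunday.
dates <- as.Date(c("01/12/2017", "02/12/2017", "03/12/2017"), format = "%d/%m/%Y")

# POSIXlt stores the weekday as 0 = Sunday, ..., 6 = Saturday.
dow <- as.POSIXlt(dates)$wday

# Weekend means Saturday (6) or Sunday (0) in this numbering.
weekend <- ifelse(dow %in% c(0, 6), "Yes", "No")
print(data.frame(dates, dow, weekend))
```

The first date being a Friday agrees with the `wday` value of 6 reported for the first rows once the column is converted to a factor.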
bike$Seasons = as.factor(bike$Seasons)
bike$Holiday = as.factor(bike$Holiday)
bike$Functioning.Day = as.factor(bike$Functioning.Day)
bike$year = as.factor(bike$year)
bike$month = as.factor(bike$month)
bike$wday = as.factor(bike$wday)
bike$weekend = as.factor(bike$weekend)
str(bike)
## 'data.frame': 8760 obs. of 18 variables:
## $ Date : Date, format: "2017-12-01" "2017-12-01" ...
## $ Rented.Bike.Count : int 254 204 173 107 78 100 181 460 930 490 ...
## $ Hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Temperature : num -5.2 -5.5 -6 -6.2 -6 -6.4 -6.6 -7.4 -7.6 -6.5 ...
## $ Humidity : int 37 38 39 40 36 37 35 38 37 27 ...
## $ Wind.Speed : num 2.2 0.8 1 0.9 2.3 1.5 1.3 0.9 1.1 0.5 ...
## $ Visibility : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 1928 ...
## $ Dew.point.temperature: num -17.6 -17.6 -17.7 -17.6 -18.6 -18.7 -19.5 -19.3 -19.8 -22.4 ...
## $ Solar.Radiation : num 0 0 0 0 0 0 0 0 0.01 0.23 ...
## $ Rainfall : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Snowfall : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Seasons : Factor w/ 4 levels "Autumn","Spring",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Holiday : Factor w/ 2 levels "Holiday","No Holiday": 2 2 2 2 2 2 2 2 2 2 ...
## $ Functioning.Day : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ year : Factor w/ 2 levels "2017","2018": 1 1 1 1 1 1 1 1 1 1 ...
## $ month : Factor w/ 12 levels "1","2","3","4",..: 12 12 12 12 12 12 12 12 12 12 ...
## $ wday : Factor w/ 7 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ weekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
# Indicator (0/1) columns for the factor levels, so that they can be passed to pairs() and cor()
bike$Seasons.Sp = as.numeric(bike$Seasons == "Spring")
bike$Seasons.Su = as.numeric(bike$Seasons == "Summer")
bike$Seasons.Fa = as.numeric(bike$Seasons == "Autumn")
bike$Seasons.Wn = as.numeric(bike$Seasons == "Winter")
bike$Holiday.Yes = as.numeric(bike$Holiday == "Holiday")
bike$Functioning.Day.Yes = as.numeric(bike$Functioning.Day == "Yes")
bike$weekend.Yes = as.numeric(bike$weekend == "Yes")
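The indicator columns above are built one at a time. As an alternative sketch, not the approach used in this report, `model.matrix()` can generate one 0/1 column per factor level. The toy data frame and the names `toy`, `dummies`, and `by_hand` are made up for illustration.

```r
# Toy factor standing in for bike$Seasons (values are illustrative only).
toy <- data.frame(Seasons = factor(c("Winter", "Spring", "Summer", "Autumn", "Winter")))

# "- 1" drops the intercept so that every level gets its own indicator column.
dummies <- model.matrix(~ Seasons - 1, data = toy)
print(dummies)

# The same Winter indicator built by hand, as in the code above.
by_hand <- as.numeric(toy$Seasons == "Winter")
print(all(dummies[, "SeasonsWinter"] == by_hand))
```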
bike_num = subset(bike, select = -c(Date, Seasons, Holiday, Functioning.Day, year, month, wday, weekend) )
pairs(bike_num)
cor(bike_num)
## Rented.Bike.Count Hour Temperature Humidity
## Rented.Bike.Count 1.00000 4.103e-01 0.538558 -0.19978
## Hour 0.41026 1.000e+00 0.124114 -0.24164
## Temperature 0.53856 1.241e-01 1.000000 0.15937
## Humidity -0.19978 -2.416e-01 0.159371 1.00000
## Wind.Speed 0.12111 2.852e-01 -0.036252 -0.33668
## Visibility 0.19928 9.875e-02 0.034794 -0.54309
## Dew.point.temperature 0.37979 3.054e-03 0.912798 0.53689
## Solar.Radiation 0.26184 1.451e-01 0.353505 -0.46192
## Rainfall -0.12307 8.715e-03 0.050282 0.23640
## Snowfall -0.14180 -2.152e-02 -0.218405 0.10818
## Seasons.Sp 0.02289 0.000e+00 0.007960 0.01569
## Seasons.Su 0.29655 0.000e+00 0.665846 0.19259
## Seasons.Fa 0.10275 0.000e+00 0.059728 0.02837
## Seasons.Wn -0.42493 0.000e+00 -0.738720 -0.23830
## Holiday.Yes -0.07234 2.642e-22 -0.055931 -0.05028
## Functioning.Day.Yes 0.20394 5.439e-03 -0.050170 -0.02080
## weekend.Yes -0.03647 0.000e+00 0.007214 -0.01695
## Wind.Speed Visibility Dew.point.temperature
## Rented.Bike.Count 0.121108 0.199280 0.379788
## Hour 0.285197 0.098753 0.003054
## Temperature -0.036252 0.034794 0.912798
## Humidity -0.336683 -0.543090 0.536894
## Wind.Speed 1.000000 0.171507 -0.176486
## Visibility 0.171507 1.000000 -0.176630
## Dew.point.temperature -0.176486 -0.176630 1.000000
## Solar.Radiation 0.332274 0.149738 0.094381
## Rainfall -0.019674 -0.167629 0.125597
## Snowfall -0.003554 -0.121695 -0.150887
## Seasons.Sp 0.083855 -0.187498 0.002056
## Seasons.Su -0.064698 0.061958 0.652378
## Seasons.Fa -0.128009 0.117413 0.062878
## Seasons.Wn 0.109186 0.008616 -0.722366
## Holiday.Yes 0.023017 0.031773 -0.066759
## Functioning.Day.Yes 0.005037 -0.026000 -0.052837
## weekend.Yes -0.022227 -0.026762 -0.006990
## Solar.Radiation Rainfall Snowfall Seasons.Sp Seasons.Su
## Rented.Bike.Count 0.261837 -0.123074 -0.141804 0.022888 0.296549
## Hour 0.145131 0.008715 -0.021516 0.000000 0.000000
## Temperature 0.353505 0.050282 -0.218405 0.007960 0.665846
## Humidity -0.461919 0.236397 0.108183 0.015694 0.192595
## Wind.Speed 0.332274 -0.019674 -0.003554 0.083855 -0.064698
## Visibility 0.149738 -0.167629 -0.121695 -0.187498 0.061958
## Dew.point.temperature 0.094381 0.125597 -0.150887 0.002056 0.652378
## Solar.Radiation 1.000000 -0.074290 -0.072301 0.079974 0.128402
## Rainfall -0.074290 1.000000 0.008500 0.017595 0.053928
## Snowfall -0.072301 0.008500 1.000000 -0.099785 -0.099785
## Seasons.Sp 0.079974 0.017595 -0.099785 1.000000 -0.336996
## Seasons.Su 0.128402 0.053928 -0.099785 -0.336996 1.000000
## Seasons.Fa -0.031374 -0.013247 -0.024742 -0.334548 -0.334548
## Seasons.Wn -0.178420 -0.058755 0.225875 -0.332099 -0.332099
## Holiday.Yes -0.005077 -0.014269 -0.012591 -0.044791 -0.073932
## Functioning.Day.Yes -0.007665 0.002055 0.032089 0.038413 0.108370
## weekend.Yes 0.012975 -0.014151 -0.006759 -0.002987 -0.002987
## Seasons.Fa Seasons.Wn Holiday.Yes Functioning.Day.Yes
## Rented.Bike.Count 0.1027530 -0.424925 -7.234e-02 0.203943
## Hour 0.0000000 0.000000 2.642e-22 0.005439
## Temperature 0.0597283 -0.738720 -5.593e-02 -0.050170
## Humidity 0.0283665 -0.238295 -5.028e-02 -0.020800
## Wind.Speed -0.1280093 0.109186 2.302e-02 0.005037
## Visibility 0.1174133 0.008616 3.177e-02 -0.026000
## Dew.point.temperature 0.0628783 -0.722366 -6.676e-02 -0.052837
## Solar.Radiation -0.0313743 -0.178420 -5.077e-03 -0.007665
## Rainfall -0.0132466 -0.058755 -1.427e-02 0.002055
## Snowfall -0.0247422 0.225875 -1.259e-02 0.032089
## Seasons.Sp -0.3345477 -0.332099 -4.479e-02 0.038413
## Seasons.Su -0.3345477 -0.332099 -7.393e-02 0.108370
## Seasons.Fa 1.0000000 -0.329686 1.498e-02 -0.253718
## Seasons.Wn -0.3296859 1.000000 1.046e-01 0.106795
## Holiday.Yes 0.0149846 0.104557 1.000e+00 -0.027624
## Functioning.Day.Yes -0.2537183 0.106795 -2.762e-02 1.000000
## weekend.Yes 0.0009994 0.005016 -3.164e-02 0.040733
## weekend.Yes
## Rented.Bike.Count -0.0364674
## Hour 0.0000000
## Temperature 0.0072144
## Humidity -0.0169510
## Wind.Speed -0.0222268
## Visibility -0.0267619
## Dew.point.temperature -0.0069896
## Solar.Radiation 0.0129755
## Rainfall -0.0141509
## Snowfall -0.0067586
## Seasons.Sp -0.0029873
## Seasons.Su -0.0029873
## Seasons.Fa 0.0009994
## Seasons.Wn 0.0050156
## Holiday.Yes -0.0316417
## Functioning.Day.Yes 0.0407333
## weekend.Yes 1.0000000
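The full correlation matrix is hard to scan. A hedged sketch of two summaries that could be drawn from it: the correlations with the response sorted by magnitude, and the predictor pairs whose absolute correlation exceeds a threshold (0.9 here, an arbitrary choice). The toy data and all names are illustrative. In the output above, Temperature and Dew.point.temperature stand out with a correlation of about 0.913.

```r
# Deterministic toy data: x2 is nearly a copy of x1, x3 is unrelated.
toy <- data.frame(
  y  = c(3, 5, 4, 8, 9, 12, 11, 15),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.1),
  x3 = c(4, 1, 3, 2, 4, 1, 3, 2)
)
cm <- cor(toy)

# Correlations of each predictor with the response, strongest first.
with_y <- cm[-1, "y"]
print(with_y[order(abs(with_y), decreasing = TRUE)])

# Predictor pairs with |r| above the threshold (upper triangle only,
# so each pair is reported once and the diagonal is skipped).
pred <- cm[-1, -1]
high <- which(abs(pred) > 0.9 & upper.tri(pred), arr.ind = TRUE)
pairs_found <- data.frame(
  var1 = rownames(pred)[high[, "row"]],
  var2 = colnames(pred)[high[, "col"]],
  r    = pred[high]
)
print(pairs_found)
```

Applied to `bike_num`, the same pattern would use `"Rented.Bike.Count"` in place of `"y"`.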
plot(Rented.Bike.Count ~ Hour, data = bike)
plot(Rented.Bike.Count ~ Temperature, data = bike)
plot(Rented.Bike.Count ~ Humidity, data = bike)
plot(Rented.Bike.Count ~ Wind.Speed, data = bike)
plot(Rented.Bike.Count ~ Visibility, data = bike)
plot(Rented.Bike.Count ~ Dew.point.temperature, data = bike)
plot(Rented.Bike.Count ~ Solar.Radiation, data = bike)
plot(Rented.Bike.Count ~ Rainfall, data = bike)
plot(Rented.Bike.Count ~ Snowfall, data = bike)
plot(Rented.Bike.Count ~ Seasons, data = bike)
plot(Rented.Bike.Count ~ Holiday, data = bike)
plot(Rented.Bike.Count ~ Functioning.Day, data = bike)
plot(Rented.Bike.Count ~ wday, data = bike)
plot(Rented.Bike.Count ~ weekend, data = bike)
plot(Rented.Bike.Count ~ month, data = bike)
model = lm(Rented.Bike.Count ~ . - Date, data = bike)
summary(model)
##
## Call:
## lm(formula = Rented.Bike.Count ~ . - Date, data = bike)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1314.3 -264.9 -49.7 205.0 1973.0
##
## Coefficients: (12 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -329.2616 99.4197 -3.31 0.00093 ***
## Hour 26.8242 0.7099 37.78 < 2e-16 ***
## Temperature 20.7410 3.5562 5.83 5.7e-09 ***
## Humidity -9.9889 0.9913 -10.08 < 2e-16 ***
## Wind.Speed 21.2107 4.8782 4.35 1.4e-05 ***
## Visibility 0.0607 0.0110 5.53 3.3e-08 ***
## Dew.point.temperature 10.9079 3.7343 2.92 0.00350 **
## Solar.Radiation -90.2572 7.2829 -12.39 < 2e-16 ***
## Rainfall -58.7891 4.0731 -14.43 < 2e-16 ***
## Snowfall 33.0677 10.7776 3.07 0.00216 **
## SeasonsSpring -0.5662 25.1718 -0.02 0.98206
## SeasonsSummer -457.2517 35.3072 -12.95 < 2e-16 ***
## SeasonsWinter -294.9597 25.4463 -11.59 < 2e-16 ***
## HolidayNo Holiday 138.8664 20.8072 6.67 2.6e-11 ***
## Functioning.DayYes 959.0631 25.5678 37.51 < 2e-16 ***
## year2018 -67.5897 21.7407 -3.11 0.00188 **
## month2 -35.4612 22.2289 -1.60 0.11069
## month3 -206.3619 24.7258 -8.35 < 2e-16 ***
## month4 -132.3871 22.8883 -5.78 7.5e-09 ***
## month5 NA NA NA NA
## month6 571.5151 23.8202 23.99 < 2e-16 ***
## month7 181.1382 21.5260 8.41 < 2e-16 ***
## month8 NA NA NA NA
## month9 -75.9409 29.4118 -2.58 0.00984 **
## month10 80.9523 23.6964 3.42 0.00064 ***
## month11 NA NA NA NA
## month12 NA NA NA NA
## wday2 80.3870 16.5340 4.86 1.2e-06 ***
## wday3 100.7829 16.6301 6.06 1.4e-09 ***
## wday4 125.2318 16.5884 7.55 4.8e-14 ***
## wday5 106.1122 16.5592 6.41 1.6e-10 ***
## wday6 138.7252 16.4494 8.43 < 2e-16 ***
## wday7 67.2491 16.5305 4.07 4.8e-05 ***
## weekendYes NA NA NA NA
## Seasons.Sp NA NA NA NA
## Seasons.Su NA NA NA NA
## Seasons.Fa NA NA NA NA
## Seasons.Wn NA NA NA NA
## Holiday.Yes NA NA NA NA
## Functioning.Day.Yes NA NA NA NA
## weekend.Yes NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 412 on 8731 degrees of freedom
## Multiple R-squared: 0.594, Adjusted R-squared: 0.593
## F-statistic: 457 on 28 and 8731 DF, p-value: <2e-16
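The summary above reports 12 coefficients as not defined because of singularities. This happens when some columns are exact linear combinations of others; for example, the hand-made `Seasons.*` indicators duplicate the `Seasons` factor. Below is a minimal sketch of how such terms could be detected with `alias()` and dropped before refitting. It uses a made-up toy data frame, the variable names are illustrative, and it is not the model fitted in this report.

```r
# Toy data: season.Su duplicates the "Summer" level of the season factor.
toy <- data.frame(
  y      = c(10, 12, 30, 33, 21, 19, 11, 31),
  season = factor(c("Winter", "Winter", "Summer", "Summer",
                    "Spring", "Spring", "Winter", "Summer")),
  temp   = c(-5, -3, 25, 27, 12, 10, -4, 26)
)
toy$season.Su <- as.numeric(toy$season == "Summer")

# Fitting with every column leaves the redundant term without an estimate (NA).
fit_full <- lm(y ~ ., data = toy)
print(coef(fit_full))

# alias() lists the terms that are exact linear combinations of earlier ones.
aliased <- rownames(alias(fit_full)$Complete)
print(aliased)

# Refit without the redundant indicator; every coefficient is now estimated.
fit_clean <- lm(y ~ . - season.Su, data = toy)
print(coef(fit_clean))
```

In the report's model, the analogous step would be to exclude the hand-made indicator columns from the formula and then decide which of the remaining overlapping calendar terms to keep.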